Populating the Cube with Data
Now you can process
actual data into your cube from the data source view. To do so, you
right-click the Comp Sales cube entry in the Solution Explorer and
choose Process, or you click the Process icon in the cube designer
(the second icon from the left). A Process Cube dialog appears,
listing the cubes available to process. You select the Comp Sales
cube (by highlighting it) and then click the Run button to start
processing the data (see Figure 51.38). You can also see in Figure 33
that the Process Option defaults to Process Full. The right choice
among the other options depends on what part of the cube needs to be
reprocessed (for example, after structure changes, data refreshes, or
incremental data changes).
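The same processing can also be scripted instead of run from the dialog. The sketch below builds an XMLA Process command as text, assuming the standard Analysis Services engine namespace and a cube-level processing type; `CompSalesDB` is a hypothetical database ID, and object IDs can differ from the display names you see in the Solution Explorer. You could paste the resulting XML into an XMLA query window in SQL Server Management Studio. The set of types shown is a subset; check the Analysis Services documentation for which types apply to which objects.

```python
import xml.etree.ElementTree as ET

# Namespace used by Analysis Services XMLA engine commands.
XMLA_NS = "http://schemas.microsoft.com/analysisservices/2003/engine"

# A subset of XMLA processing types that can be applied to a cube.
CUBE_PROCESS_TYPES = {
    "ProcessFull",       # drop and rebuild data, indexes, and aggregations
    "ProcessDefault",    # do whatever is needed to reach a processed state
    "ProcessData",       # load data only, without indexes or aggregations
    "ProcessIndexes",    # build indexes and aggregations on loaded data
    "ProcessStructure",  # process the cube structure and its dimensions
    "ProcessClear",      # drop the data held by the cube
}


def build_process_command(database_id: str, cube_id: str,
                          process_type: str = "ProcessFull") -> str:
    """Return an XMLA <Process> command for a single cube, as a string."""
    if process_type not in CUBE_PROCESS_TYPES:
        raise ValueError(f"unsupported process type: {process_type}")
    process = ET.Element("Process", {"xmlns": XMLA_NS})
    obj = ET.SubElement(process, "Object")
    ET.SubElement(obj, "DatabaseID").text = database_id
    ET.SubElement(obj, "CubeID").text = cube_id
    ET.SubElement(process, "Type").text = process_type
    return ET.tostring(process, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical IDs; look up the real object IDs in your own project.
    print(build_process_command("CompSalesDB", "Comp Sales"))
```

Defaulting to `ProcessFull` mirrors the dialog's default; a different type would be passed explicitly for, say, a data-only reload.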
A Process Progress
dialog appears as processing begins. Remember that what has been loaded
so far is only the dimension member values and the measure values; the
data has not yet been aggregated up through a complete cube
representation (at all levels of the hierarchies). That happens
shortly, via the Aggregation Design Wizard. You can use your cube right
now, but browsing it would be slow, because queries would have to
aggregate the detail data on the fly.
Aggregating Data Within the Cube
The last step in creating
your OLAP cube is running the Aggregation Design Wizard and deciding
how best to represent and aggregate the data for your users. This is
the point at which you must determine the aggregation levels and the
storage mode for these aggregations (MOLAP, HOLAP, or ROLAP) that give
the best performance for queries against the cube.
You double-click the cube entry in the Solution Explorer (Comp Sales.cube)
to bring up the cube designer for your newly created cube. Then you
click the Partitions tab to see the current partition for Comp Sales. Figure 34
shows that the default storage mode is MOLAP and that there is no
Aggregation Design for this cube yet. Just to the lower right of this
tab is the Storage Settings option, which lists the storage options
available for your partition (see Figure 35).
You need to indicate what
type of storage mode and caching options you want for the partition
that will contain your aggregations. You want to optimize performance and don’t
need real-time refreshes of the data. For these reasons, you specify
the MOLAP (native SSAS storage) mode. Figure 35
shows this MOLAP specification in the Storage Settings dialog. This
dialog works as a sliding scale. You just need to make sure the slider
is positioned at the MOLAP storage option.
You also want to take
advantage of the proactive caching capabilities that come with SSAS.
You can activate this feature by clicking the Options button of this
dialog and then checking the Enable Proactive Caching check box at the
top of the Storage Options dialog that appears (see Figure 36). In addition, you select the Update the Cache When Data Changes option and set the interval times for these refreshes, as shown in Figure 36.
A good rule of thumb is to
set the cache refresh interval based on your response-time
requirements, how volatile the data in the data source view is, and
whether the changes will have a dramatic effect on the BI query results.
Now you can run the
Aggregation Design Wizard to see whether you can optimize your
partition for querying. You simply go to the Aggregation Design tab for
this cube (in the cube designer) and choose the Design Aggregations
option (click the first icon in the Aggregation Design tab or
right-click within the tab and choose Design Aggregations). This
launches the Aggregation Design Wizard.
The first dialog lets you
specify object counts: the total number of fact rows and the number of
members at each hierarchy level within each dimension. If you know what
the full counts will be for your cube, you can supply them manually in
the Estimated Count column (see Figure 37).
You typically do this when you have loaded only part of the data or
when the data will grow rapidly over time. If you are building a cube
of static size and have already populated it, you just click the Count
button to have the wizard count the actual data and use those counts as
the basis of the aggregation design.
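To make clear what these object counts are, here is a small sketch that counts fact rows and distinct members per hierarchy level over a flat list of rows. It is not how SSAS implements the Count button; the row layout and the `count_objects` and `project_fact_count` helpers are hypothetical, and the projection assumes the data grows uniformly, which may not hold for your cube.

```python
from collections import defaultdict


def count_objects(fact_rows, hierarchy):
    """Count fact rows and distinct members at each level of one hierarchy.

    fact_rows: list of dicts, one per fact row (a hypothetical flat layout).
    hierarchy: level names from top to bottom, e.g. ["Year", "Quarter", "Month"].
    Members are keyed by their full path from the top level, so Q1 of 2008
    and Q1 of 2009 count as two separate Quarter members.
    """
    members = defaultdict(set)
    fact_count = 0
    for row in fact_rows:
        fact_count += 1
        path = ()
        for level in hierarchy:
            path += (row[level],)
            members[level].add(path)
    return fact_count, {level: len(members[level]) for level in hierarchy}


def project_fact_count(loaded_facts, loaded_fraction):
    """Scale a partial-load fact count up to an estimated full count.

    Assumes facts accumulate uniformly; member counts at the upper levels
    (such as Year) generally do not scale this way.
    """
    if not 0 < loaded_fraction <= 1:
        raise ValueError("loaded_fraction must be in (0, 1]")
    return round(loaded_facts / loaded_fraction)
```

For example, if only one of five years has been loaded, `project_fact_count(loaded, 0.2)` gives a rough figure for the fact count you might enter in the Estimated Count column.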
The next dialog optimizes
the storage, based on the level of aggregation. You can specify a
maximum storage size (the wizard designs aggregations that fit within
the disk space you can allocate to the cube), tell the wizard to
optimize until it reaches a certain performance gain (for example, 50%
or 80%), start the aggregation design process and stop it yourself
when you feel the cube is optimized enough, or design no aggregations
at all. You really want to watch the aggregation design process happen.
Remember that the higher the performance you want, the more storage it
will require (and the longer it will take to reprocess the
aggregations). As you can see in Figure 38, you should select the I
Click Stop option and stop the aggregation design when the optimization
level starts to level off (somewhere between 75% and 88%). Any further
optimization would really just waste storage space.
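Deciding when the curve "levels off" is a judgment call. One way to make it concrete is to stop once each extra megabyte buys less than some minimum number of percentage points of optimization. The sketch below applies that hypothetical rule to a progress curve; it is an illustration of diminishing returns, not something the wizard computes for you, and the curve values in the test are made-up numbers rather than SSAS output.

```python
def stop_index(curve, min_gain_per_mb):
    """Pick where to stop an aggregation design, given its progress curve.

    curve: (storage_mb, optimization_pct) pairs in the order the design
    reaches them; storage must strictly increase. Returns the index of the
    last point before the marginal gain (percentage points per extra MB)
    drops below min_gain_per_mb.
    """
    if not curve:
        raise ValueError("curve must not be empty")
    for i in range(1, len(curve)):
        (prev_mb, prev_pct), (mb, pct) = curve[i - 1], curve[i]
        if mb <= prev_mb:
            raise ValueError("storage must strictly increase along the curve")
        if (pct - prev_pct) / (mb - prev_mb) < min_gain_per_mb:
            return i - 1
    return len(curve) - 1
```

With a hypothetical curve such as `[(1, 20), (3, 45), (6, 65), (10, 78), (15, 84), (21, 86), (28, 86.5)]` and a threshold of 0.5 points per MB, the rule stops at the 84% point, because the next step adds only 2 points for 6 MB.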
When you are satisfied with the aggregation design, you simply click Next and name this design (the sample is named AggregationDesignPrimary, as you can see in Figure 39). You then assign this aggregation design to the partition in the Partitions tab.
If your company has sales
transaction data for the past five years and 250 stores that sell an
average of 1,000 items per day, the fact table will have 456,500,000
rows (1,826 days, including one leap day, × 250 stores × 1,000 items).
This is obviously a challenge in terms of disk space by itself,
without aggregation tables to go along with it. The control that SSAS
provides here is important in balancing storage and retrieval speed
(that is, performance versus size). Aggregations are built to optimize
rollup operations so
that higher levels of aggregation are easily derived from the existing
aggregations to satisfy broader queries. If a high degree of query
optimization weren’t possible due to limitations in storage space, SSAS
might choose to build aggregates of monthly or quarterly data only. If
a user queried the cube for yearly or multiyear data, those
aggregations would be created dynamically from the highest level of
pre-aggregated data. With disk storage becoming ever cheaper and
servers becoming more powerful, the tendency is to favor performance
gains. A recommended approach is to specify a performance gain of
between 80% and 90% here.
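The idea that broader queries are answered from existing aggregations, rather than from the fact table, can be shown with a few lines of code. In this sketch (hypothetical helper names and data), quarterly and yearly totals are derived from monthly totals that were already aggregated. It only works for additive measures such as sums and counts; a distinct count, for example, cannot be rolled up by adding lower-level values.

```python
from collections import defaultdict


def roll_up(aggregates, coarser_key):
    """Derive a coarser aggregation from a finer, already-aggregated one.

    aggregates: {fine_key: total} for an additive measure such as sales;
    coarser_key maps each fine key to its parent key.
    """
    result = defaultdict(int)
    for key, total in aggregates.items():
        result[coarser_key(key)] += total
    return dict(result)


def month_to_quarter(year_month):
    """Map a (year, month) key to its (year, quarter) parent."""
    year, month = year_month
    return (year, (month - 1) // 3 + 1)


def month_to_year(year_month):
    """Map a (year, month) key to its year."""
    return year_month[0]


if __name__ == "__main__":
    # Hypothetical monthly sales totals keyed by (year, month).
    monthly = {(2008, 1): 100, (2008, 2): 50, (2008, 4): 25, (2009, 1): 10}
    quarterly = roll_up(monthly, month_to_quarter)
    print(quarterly)                             # {(2008, 1): 150, (2008, 2): 25, (2009, 1): 10}
    print(roll_up(quarterly, lambda yq: yq[0]))  # {2008: 175, 2009: 10}
```

The yearly totals come out the same whether they are rolled up from the monthly or the quarterly aggregation, which is why storing only the lower levels still lets the higher levels be answered without rescanning the fact rows.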
You are now ready to
complete the Aggregation Design Wizard. The final step is to either
process this aggregation or save your results and process it later. You
should choose to process this aggregation now and then click Finish
(see Figure 40).
The Process Progress dialog appears immediately, and you get to watch
the cube’s aggregations being built (that is, populated) across its
partitions. Aggregation SQL queries are generated behind the scenes to
populate all these aggregation levels (which implement your aggregation
design). It’s convenient to have SSAS create these complex queries for
this critical performance-optimization step so that you don’t have to
write them yourself.
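For a feel of what populating one aggregation level involves, here is a GROUP BY over a small fact table, run in an in-memory SQLite database. The table and column names are hypothetical, and this is not the statement SSAS actually issues; it only shows the kind of summarization that an aggregation at the month-and-store grain stores ahead of time.

```python
import sqlite3

# Hypothetical aggregation at the month-by-store grain.
AGGREGATION_SQL = """
    CREATE TABLE agg_sales_month_store AS
    SELECT sale_year, sale_month, store_id,
           SUM(quantity) AS total_quantity,
           SUM(amount)   AS total_amount,
           COUNT(*)      AS fact_count
    FROM fact_sales
    GROUP BY sale_year, sale_month, store_id
"""


def build_aggregation(conn):
    """(Re)build the month-by-store aggregation from the fact table."""
    conn.execute("DROP TABLE IF EXISTS agg_sales_month_store")
    conn.execute(AGGREGATION_SQL)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_sales (sale_year INTEGER, sale_month INTEGER, "
        "store_id INTEGER, quantity INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
        [(2008, 1, 1, 2, 20.0), (2008, 1, 1, 3, 30.0),
         (2008, 1, 2, 1, 10.0), (2008, 2, 1, 4, 40.0)],
    )
    build_aggregation(conn)
    for row in conn.execute(
        "SELECT * FROM agg_sales_month_store "
        "ORDER BY sale_year, sale_month, store_id"
    ):
        print(row)
```

Four fact rows collapse into three aggregate rows here; at the scale of the 456,500,000-row example, that reduction is what makes broad queries fast.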
When this step completes, you have a fully optimized cube that is ready for data browsing. Congratulations!